A hierarchical network heuristic for solving the orientation problem in genome assembly

Authors

  • Karl R. B. Schmitt
  • Aleksey V. Zimin
  • Guillaume Marçais
  • James A. Yorke
  • Michelle Girvan
Abstract

In the past several years, the problem of genome assembly has received considerable attention from both biologists and computer scientists. An important component of current assembly methods is the scaffolding process. This process involves building ordered and oriented linear collections of contigs (continuous overlapping sequence reads) called scaffolds and relies on the use of mate pair data. A mate pair is a set of two reads that are sequenced from the ends of a single fragment of DNA, and therefore have opposite mutual orientations. When the two reads of a mate pair are placed into two different contigs, one can infer the mutual orientation of these contigs. While several orientation algorithms exist as part of assembly programs, all encounter challenges while solving the orientation problem due to errors from mis-assemblies in contigs or errors in read placements. In this paper we present an algorithm based on hierarchical clustering that independently solves the orientation problem and is robust to errors. We show that our algorithm can correctly solve the orientation problem for both faux (generated) assembly data and real assembly data for R. sphaeroides bacteria. We demonstrate that our algorithm is stable to both changes in the initial orientations and noise in the data, making it advantageous compared to traditional approaches.

Author Summary

Constructing an organism’s entire DNA sequence from raw genome sequencing data, like the data produced in the Human Genome Project, is a challenging task. The type of data generated in the sequencing process has changed substantially over the years as a result of various technological improvements. The computer programs that convert such data into assembled sequences must continuously be revised to keep pace with the changing nature of the data. This paper builds upon current methods from the emerging field of network science to develop a new way of analyzing and correcting sequencing data. We show that our algorithm is both more robust to erroneous data and more accurate overall than current techniques.
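
To make the orientation problem concrete, the sketch below shows how mate pairs that span two contigs can be turned into "same orientation" or "flip one" constraints, and how a simple baseline might assign orientations from them. This is not the hierarchical clustering algorithm the paper presents. It is a plain majority-vote and breadth-first propagation baseline, and every name in it (`link_sign`, `orient_contigs`, the `(contig, strand, contig, strand)` tuple layout) is a hypothetical choice made for illustration.

```python
from collections import defaultdict, deque


def link_sign(strand_a, strand_b):
    """Return +1 if a mate pair implies both contigs share an orientation,
    -1 if one contig must be flipped.

    Assumption: the two reads of a mate pair should land on opposite
    strands, so opposite strands mean the contigs already agree.
    """
    return 1 if strand_a != strand_b else -1


def orient_contigs(mate_links):
    """Assign +1/-1 orientations to contigs (illustrative baseline only).

    mate_links: iterable of (contig_a, strand_a, contig_b, strand_b),
    with strands given as '+' or '-'.
    """
    # Tally the evidence for every unordered pair of distinct contigs.
    votes = defaultdict(int)
    for contig_a, strand_a, contig_b, strand_b in mate_links:
        if contig_a == contig_b:
            continue  # both reads in one contig say nothing about orientation
        key = (contig_a, contig_b) if contig_a <= contig_b else (contig_b, contig_a)
        votes[key] += link_sign(strand_a, strand_b)

    # Build a signed graph, keeping the majority sign for each pair.
    graph = defaultdict(list)
    for (contig_a, contig_b), total in votes.items():
        if total == 0:
            continue  # tied evidence: drop the edge
        sign = 1 if total > 0 else -1
        graph[contig_a].append((contig_b, sign))
        graph[contig_b].append((contig_a, sign))

    # Propagate orientations through each connected component.
    orientation = {}
    for start in sorted(graph):
        if start in orientation:
            continue
        orientation[start] = 1
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor, sign in graph[node]:
                if neighbor not in orientation:
                    orientation[neighbor] = orientation[node] * sign
                    queue.append(neighbor)
    return orientation


links = [
    ("c1", "+", "c2", "-"),  # consistent: c1 and c2 agree
    ("c1", "+", "c2", "-"),  # consistent again
    ("c1", "+", "c2", "+"),  # conflicting link, outvoted 2 to 1
    ("c2", "+", "c3", "+"),  # same strand: c3 must be flipped relative to c2
]
result = orient_contigs(links)
```

A baseline of this kind fixes each contig the first time it is reached, so a single misleading edge can propagate through a whole component. That sensitivity to erroneous links is the weakness the abstract attributes to existing orientation methods.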

Similar resources

Solving a multi-objective mixed-model assembly line balancing and sequencing problem

This research addresses the mixed-model assembly line (MMAL) by considering various constraints. In MMALs, several highly similar types of products are made on one assembly line. As a consequence, it is possible to assemble and make several types of products simultaneously without spending any additional time. The proposed multi-objective model considers the balancing and sequ...

Network Algorithms for Complex Systems with Applications to Non-linear Oscillators and Genome Assembly

Title of dissertation: Network Algorithms for Complex Systems with Applications to Non-linear Oscillators and Genome Assembly. Karl R. B. Schmitt, Doctor of Philosophy, 2013. Dissertation directed by: Assistant Professor Michelle Girvan, Department of Physics, and Dr. Aleksey Zimin, Institute for Physical Science and Technology. Network and complex system models are useful for studying a wide range of ...

Solving a Multi-Item Supply Chain Network Problem by Three Meta-heuristic Algorithms

Supply chain network design not only assists an organization's production process (e.g., planning, controlling and executing a product’s flow) but also addresses the growing long-term needs of companies. This paper develops a three-echelon supply chain network problem including multiple plants, multiple distributors, and multiple retailers with a multi-mode demand satisfaction policy inside of pro...

An Analytical Approach for Single and Mixed-Model Assembly Line Rebalancing and Worker Assignment Problem

In this paper, an analytical approach is used for assembly line rebalancing and worker assignment for single and mixed-model assembly lines based on a heuristic-simulation algorithm. This approach helps managers select a better marketing strategy when different combinations of demands are suitable. Furthermore, they can use it as a guideline to know which worker assignment is better for ea...

A novel bi-level stochastic programming model for supply chain network design with assembly line balancing under demand uncertainty

This paper investigates the integration of strategic and tactical decisions in the supply chain network design (SCND) considering assembly line balancing (ALB) under demand uncertainty. Due to the decentralized decisions, a novel bi-level stochastic programming (BLSP) model has been developed in which SCND problem has been considered in the upper-level model, while the lower-level model contain...

Publication date: 2013